This report is for ETC5521 Assignment 1 by Team wallaby, comprising Helen Evangelina and Rahul Bharadwaj.

Introduction and Motivation

Music, in a broad sense, is any art composed of sound. It expresses people’s thoughts and feelings, reflects the composer’s life experience, and gives listeners both the enjoyment of beauty and an outlet for emotion. Music is also a form of social behavior through which people exchange feelings and life experiences.

In ancient times, music was played at court banquets, or when people of talent toured the countryside, to add to the enjoyment. In modern times, classical music has a high barrier to entry and its development has largely plateaued, so it now appeals to a fairly small audience, while pop music (the general name for popular songs, including rock, R&B, Latin, etc.) has gradually developed its own character. Modern songs have therefore come to occupy the top position in people’s hearts because of how well they convey emotion and life experience, and listening to pop music has become one of the most common forms of everyday entertainment.

Spotify is a licensed music streaming platform supported by Warner Music, Sony, EMI and other major record companies around the world. It has more than 60 million users and is a world-leading large-scale online music streaming platform.

Because Spotify holds a large amount of user data, four interested users, Charlie Thompson, Josiah Parry, Donal Phipps and Tom Wolff, created the spotifyr package to make it easier for everyone to explore their own preferences, or mainstream listening habits, through Spotify’s API. In addition to the spotifyr data, our dataset incorporates data from a blog post by Kaylin Pavlik, in which six main categories (EDM, Latin, pop, R&B, rap, rock) are used to classify about 5000 songs. Combining the two sources is very useful for studying the popularity of pop music.

Nowadays, music plays an important role in people’s lives and helps them manage and improve their quality of life. As music fans, we not only enjoy music but also wonder how it moves people with simple tones, rhythms, timbres and words. How popular is each genre? How much influence do the genre and the various attributes of a song have on its popularity? Does music make us dance or sing unconsciously, or does it convey our emotions and reflect our thoughts? The curiosity behind these questions drives this analysis.

Analysis Questions

By doing this exploratory data analysis, we want to know:

Primary Question: Which audio features affect the popularity of musical works and contribute to the emergence of top songs?

Sub Questions:

  1. Since 1957, what are the audio features of the top artists who have produced the most tracks?

  2. Explore the works of our favorite artist, Coldplay: for example, how much musical positiveness do their albums convey?

  3. There are plenty of modern music genres; what unique style or charm makes one stand out and become people’s first choice?

Questions Added to enhance the scope of the analysis:

  1. What exactly makes artists stand out even when there are artists doing the same kind of music? What is the Unique Selling Point (USP) of a few particular selected artists?

This helps us enhance the scope of the primary analysis and broadens our understanding of the relations between popularity and audio features.

Data Description

Data Source

Data Collection Methods:

  • The spotifyr package can extract track audio features and other related information from Spotify’s Web API in batches. For example, to search for an artist, type in the artist’s name and all of their albums or songs are listed in seconds.

  • The spotifyr package also records the popularity metrics of tracks and albums, which makes it easy to study the relationship between music popularity and music characteristics. Jon Harmon and Neal Grantham then used the spotifyr package to extract the data and followed Kaylin Pavlik’s recent blog post to assign genres to nearly 5000 songs, producing the dataset, distributed through the tidytuesdayR package, that we use for this assignment.

  • We chose music works created by artists that can be found on Spotify from January 1, 1957 to January 29, 2020.

Data Structure

  • After reading the data into RStudio, our team used the glimpse() function to show the content and structure of the data. Here is a brief summary of the data structure:
## Rows: 32,833
## Columns: 24
## $ X                        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
## $ track_id                 <chr> "6f807x0ima9a1j3VPbc7VN", "0r7CVbZTWZgbTCY...
## $ track_name               <chr> "I Don't Care (with Justin Bieber) - Loud ...
## $ track_artist             <chr> "Ed Sheeran", "Maroon 5", "Zara Larsson", ...
## $ track_popularity         <int> 66, 67, 70, 60, 69, 67, 62, 69, 68, 67, 58...
## $ track_album_id           <chr> "2oCs0DGTsRO98Gh5ZSl2Cx", "63rPSO264uRjW1X...
## $ track_album_name         <chr> "I Don't Care (with Justin Bieber) [Loud L...
## $ track_album_release_date <chr> "2019-06-14", "2019-12-13", "2019-07-05", ...
## $ playlist_name            <chr> "Pop Remix", "Pop Remix", "Pop Remix", "Po...
## $ playlist_id              <chr> "37i9dQZF1DXcZDD7cfEKhW", "37i9dQZF1DXcZDD...
## $ playlist_genre           <chr> "pop", "pop", "pop", "pop", "pop", "pop", ...
## $ playlist_subgenre        <chr> "dance pop", "dance pop", "dance pop", "da...
## $ danceability             <dbl> 0.748, 0.726, 0.675, 0.718, 0.650, 0.675, ...
## $ energy                   <dbl> 0.916, 0.815, 0.931, 0.930, 0.833, 0.919, ...
## $ key                      <int> 6, 11, 1, 7, 1, 8, 5, 4, 8, 2, 6, 8, 1, 5,...
## $ loudness                 <dbl> -2.634, -4.969, -3.432, -3.778, -4.672, -5...
## $ mode                     <int> 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, ...
## $ speechiness              <dbl> 0.0583, 0.0373, 0.0742, 0.1020, 0.0359, 0....
## $ acousticness             <dbl> 0.10200, 0.07240, 0.07940, 0.02870, 0.0803...
## $ instrumentalness         <dbl> 0.00e+00, 4.21e-03, 2.33e-05, 9.43e-06, 0....
## $ liveness                 <dbl> 0.0653, 0.3570, 0.1100, 0.2040, 0.0833, 0....
## $ valence                  <dbl> 0.518, 0.693, 0.613, 0.277, 0.725, 0.585, ...
## $ tempo                    <dbl> 122.036, 99.972, 124.008, 121.956, 123.976...
## $ duration_ms              <dbl> 194754, 162600, 176616, 169093, 189052, 16...
  • The spotify_song data is tabular, with 24 columns and 32,833 rows. The variables and their types are listed above.
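The inspection step can be sketched in base R as follows. This is only an illustration: `songs_sample` is a hypothetical stand-in built from the first three rows of a few columns in the output above, and `str()` is used as the base R counterpart of `glimpse()`.

```r
# Stand-in for the full dataset: first three rows of a few columns
# copied from the glimpse() output above (songs_sample is a hypothetical name).
songs_sample <- data.frame(
  track_artist     = c("Ed Sheeran", "Maroon 5", "Zara Larsson"),
  track_popularity = c(66L, 67L, 70L),
  playlist_genre   = c("pop", "pop", "pop"),
  danceability     = c(0.748, 0.726, 0.675),
  energy           = c(0.916, 0.815, 0.931),
  stringsAsFactors = FALSE
)

# Base R counterpart of glimpse(): one line per column with its type.
str(songs_sample)

# Column types as a named character vector.
col_types <- vapply(songs_sample, function(x) class(x)[1], character(1))
print(col_types)

# Dimensions: number of rows, then number of columns.
print(dim(songs_sample))
```

On the full data the same calls would report the 32,833 rows and 24 columns described above.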

Data Table -

A Visual Overview of the Data:

Visual Representation of the Dataset


  • A picture is worth a thousand words, so we present the same column names and types described above in a simple visualization.

  • Since our analysis focuses on correlations between audio features, it is useful to first see how the numerical fields correlate.

A Visual Representation of the Correlation of numeric data


  • The visualization above shows how the numerical variables correlate with one another. This gives us a basic understanding of which correlations to analyze.
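A correlation overview of this kind can be computed with base R's `cor()`. The sketch below is illustrative only: it uses the first five rows of four numeric columns from the glimpse() output, so the resulting values say nothing about the full dataset.

```r
# Toy values: first five rows of four numeric columns from the glimpse() output.
features <- data.frame(
  danceability = c(0.748, 0.726, 0.675, 0.718, 0.650),
  energy       = c(0.916, 0.815, 0.931, 0.930, 0.833),
  loudness     = c(-2.634, -4.969, -3.432, -3.778, -4.672),
  valence      = c(0.518, 0.693, 0.613, 0.277, 0.725)
)

# Keep numeric columns only, then compute pairwise Pearson correlations.
numeric_cols <- features[vapply(features, is.numeric, logical(1))]
cor_matrix <- cor(numeric_cols)

# The matrix is symmetric with ones on the diagonal.
print(round(cor_matrix, 2))
```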

Data Cleaning:

  • Next, we clean the data, select the variables that are useful for our EDA, and retain the six major music genres (the proportions of other genres are very low and can be ignored). We then arrange the data from high to low by track popularity.
Clean Data with necessary columns


  • The above figure gives an overview of the columns necessary for our analysis. This data is clean with less than 0.1% missing data and is ready for analysis.
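The cleaning steps described above (select columns, keep the six genres, sort by popularity, check missingness) might look like this in base R. The rows, the `"jazz"` genre and the choice of kept columns are made up for illustration and are not the report's actual selection.

```r
# Toy rows; all values are illustrative, not taken from the dataset.
songs <- data.frame(
  track_name       = c("Song A", "Song B", "Song C", "Song D"),
  playlist_genre   = c("pop", "rock", "jazz", "edm"),
  track_popularity = c(40L, 90L, 80L, 75L),
  valence          = c(0.5, 0.2, 0.9, 0.4),
  playlist_id      = c("p1", "p2", "p3", "p4"),
  stringsAsFactors = FALSE
)

keep_genres <- c("edm", "latin", "pop", "r&b", "rap", "rock")
keep_cols   <- c("track_name", "playlist_genre", "track_popularity", "valence")

# Keep the six genres and the chosen columns (assumed selection).
clean <- songs[songs$playlist_genre %in% keep_genres, keep_cols]

# Arrange from most to least popular.
clean <- clean[order(clean$track_popularity, decreasing = TRUE), ]
rownames(clean) <- NULL

# Share of missing cells across the cleaned table.
missing_share <- mean(is.na(clean))
print(clean)
print(missing_share)
```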

Analysis and Findings

Top Artists

  • From the above table, we can see that Queen, Martin Garrix and The Chainsmokers rank first, second and third respectively. The list also includes many other famous artists, such as Drake, Maroon 5 and Ed Sheeran.
Top 20 Artists who wrote the most songs from 1957 to 2020

  • The figure above shows the same information as a bar plot. It reinforces our impression of the top 20 artists and gives an intuitive sense of the gaps between them. As mentioned previously, pictures convey more than tables and text.

  • We filter artists whose popularity is greater than 95 and visualize them in a radar plot. This way, the singers at the top can be identified at a glance, and music lovers can see the characteristics of these top singers’ tracks.

Characteristics of Top Singers


  • The height of each pie segment shows the level of popularity. The color intensity shows the energy levels of the songs by that artist and different colors represent genre. The blue outline describes the danceability of the songs. This way, we can perceive three audio features at the same time along with Track Popularity.

  • From the figure above, we can see that Maroon 5, The Weeknd, Roddy Ricch and KAROL G lead in popularity. It is also clear that popular singers usually release songs in several genres rather than limiting themselves to one.

  • The artists’ styles also differ greatly. For example, the color intensity shows that the energy of Maroon 5’s and Billie Eilish’s tracks is not very high. This is not a shortcoming but a reflection of their style, which is lyrical and soft. Judging from the outline color of each segment, Roddy Ricch’s and Trevor Daniel’s tracks have the highest danceability, a measure that combines tempo, rhythm stability, beat strength and overall regularity.
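The data behind such a radar plot can be prepared by filtering on popularity and averaging the features per artist. The sketch below assumes the filter is applied at track level; the artists and values are hypothetical.

```r
# Hypothetical tracks; artist names and values are made up for illustration.
tracks <- data.frame(
  track_artist     = c("Artist A", "Artist A", "Artist B", "Artist C"),
  track_popularity = c(97, 96, 98, 80),
  energy           = c(0.40, 0.60, 0.70, 0.90),
  danceability     = c(0.80, 0.70, 0.90, 0.50),
  stringsAsFactors = FALSE
)

# Keep only tracks with popularity above 95, as described above.
top_tracks <- tracks[tracks$track_popularity > 95, ]

# One row per artist with the mean of each plotted feature.
top_profile <- aggregate(
  cbind(track_popularity, energy, danceability) ~ track_artist,
  data = top_tracks,
  FUN  = mean
)
print(top_profile)
```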

Analyzing our Favorite Artist - Coldplay

  • In this part, we take one artist as an example for a more detailed exploratory analysis using the “spotifyr” package. We chose Coldplay, our favorite artist.

  • First, we loaded all the Coldplay albums available on Spotify and dropped the duplicates (some live tour albums duplicate existing ones). We then calculated the average valence of each album. The results are shown in the following table.

The Musical Positiveness of Coldplay’s Albums
album_name                                   valence
Everyday Life                                0.30
Viva La Vida or Death and All His Friends    0.26
Mylo Xyloto                                  0.25
Parachutes                                   0.23
A Head Full of Dreams                        0.23
X&Y                                          0.22
Ghost Stories                                0.21
Love in Tokyo                                0.19
A Rush of Blood to the Head                  0.18
  • According to the Spotify tracks documentation, valence is measured from 0.0 to 1.0 and describes the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). The highest average valence among these albums is 0.30 and the lowest is 0.18, which means Coldplay’s songs usually sound more negative than positive to the audience.
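The album averages can be reproduced in outline as follows. The track-level rows here are hypothetical (the report does not list them) and duplicates are assumed to be repeated album/track pairs; only the two-step shape, deduplicate then average per album, follows the text.

```r
# Hypothetical track-level rows; the valence values are illustrative only.
coldplay <- data.frame(
  album_name = c("Parachutes", "Parachutes", "Parachutes", "X&Y", "X&Y", "X&Y"),
  track_name = c("t1", "t2", "t2", "t3", "t4", "t5"),
  valence    = c(0.20, 0.26, 0.26, 0.10, 0.30, 0.26),
  stringsAsFactors = FALSE
)

# Drop repeated album/track pairs (assumed definition of a duplicate).
deduped <- coldplay[!duplicated(coldplay[c("album_name", "track_name")]), ]

# Average valence per album, highest first.
album_valence <- aggregate(valence ~ album_name, data = deduped, FUN = mean)
album_valence <- album_valence[order(album_valence$valence, decreasing = TRUE), ]
rownames(album_valence) <- NULL
print(album_valence)
```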

  • Second, we make a density plot to show the ranges and densities of valence of each album.

Valence Density of Coldplay Albums


  • From the above figure, we can see that “Everyday Life” has the widest range of valence; that is, this album covers a wide range of emotions. “A Rush of Blood to the Head”, by contrast, has a narrow range, with its density concentrated at low valence values, so listeners are likely to perceive negative emotions such as sadness, depression or anger when listening to it. This finding surprised us because “A Rush of Blood to the Head” is ranked the second-best album in “The Coldplay Albums Ranked”, so we decided to look at it in more depth.
The most frequent words in ‘A Rush of Blood to the Head’
word     sentiment    n
love     positive     7
easy     positive     4
fall     negative     4
grace    positive     4
miss     negative     4
  • Lastly, we analyzed the sentiment of this album’s lyrics to see whether the valence of an album is associated with its lyrics. The average sentiment value of the album is -0.47 using the “afinn” lexicon. We also analyzed the lyrics using the “bing” lexicon: the above table shows the most frequent words in this album and their sentiment, and the figure below shows the frequency of words that appear more than once. Negative words appear more often than positive ones.

  • As a result, both in terms of sound and lyrics, this album appears to convey negative emotions. Yet this does not stop people from regarding “A Rush of Blood to the Head” as one of Coldplay’s best albums. The audience’s love for an album is evidently not determined entirely by its positiveness but also by how well listeners can relate to the emotions in the songs. This suggests that people use music for all kinds of emotional experiences, not just to have fun or feel refreshed!
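The word-level sentiment count can be sketched without any lexicon package. The five lexicon entries below are the word/label pairs from the table above; the `lyrics_words` vector is made up, and the real analysis used the full “bing” lexicon rather than this hypothetical mini version.

```r
# Made-up token list standing in for the album's lyrics.
lyrics_words <- c("love", "fall", "love", "miss", "easy", "the", "fall", "fall")

# Mini lexicon: the word/sentiment pairs shown in the table above.
lexicon <- data.frame(
  word      = c("love", "easy", "grace", "fall", "miss"),
  sentiment = c("positive", "positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Count each word, then keep only words present in the lexicon (inner join).
word_counts <- as.data.frame(table(word = lyrics_words),
                             responseName = "n", stringsAsFactors = FALSE)
scored <- merge(word_counts, lexicon, by = "word")
scored <- scored[order(-scored$n, scored$word), ]
rownames(scored) <- NULL

# Total occurrences per sentiment label.
totals <- tapply(scored$n, scored$sentiment, sum)
print(scored)
print(totals)
```

An “afinn”-style score would replace the labels with numeric values per word and average them instead of counting.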

Analyzing the Audio Features

  • In this part, we analyzed the audio features of all the songs in our dataset. Here’s a simple explanation of these features:
    • acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic.

    • danceability: Danceability describes how suitable a track is for dancing. A value of 0.0 is least danceable and 1.0 is most danceable.

    • duration_ms: The duration of the track in milliseconds. (And duration_s in seconds, rounded.)

    • energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.

    • instrumentalness: Predicts whether a track contains no vocals.

    • key: The key the track is in.

    • liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live.

    • loudness: The overall loudness of a track in decibels (dB).

    • mode: Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.

    • speechiness: Speechiness detects the presence of spoken words in a track.

    • tempo: The overall estimated tempo of a track in beats per minute (BPM).

    • valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

  • The figure below shows how these features are distributed across genres.
Audio Feature Density Plot


  • The next three box plots show how music attributes differ between genres.

  • First, each genre is mapped to a color in a single tibble called “COLORS”, so that genres can be clearly distinguished by color and the specific characteristics of each genre can be read from the box plots.

Average valence by Music Genre


  • The first plot above shows the relationship between genre and valence. Latin has the highest valence and EDM the lowest, which suggests that Latin music conveys more musical positiveness while EDM sounds more negative. The other four genres show no obvious trend, with most values lying between roughly 0.3 and 0.7.
Average Energy by Music Genre


  • The second plot above describes the relationship between genre and energy. Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. EDM has the highest energy and Rhythm and Blues (R&B) the lowest, which reflects the style of these two genres. EDM mostly feels energetic, loud and noisy, whereas R&B is mainly lyrical, slow and quiet and brings less energy to listeners. Rock, long known for its bold expression and passionate rhythm, ranks second only to EDM.

  • Finally, the third plot describes the relationship between genre and speechiness. Speechiness detects the presence of spoken words in a track: the more spoken words a song contains, the closer the value is to 1.0. This attribute is interesting because it indicates whether artists tend to express their ideas through spoken lyrics or through melody.

Average Speechiness by Music Genre


  • From the plot, rap clearly takes first place, since rap is characterized by rapidly delivered rhyming lyrics over a mechanical, rhythmic backing. It is worth noting that rock and pop are the lowest, which shows that these two genres tend to move the audience through melody or rhythm rather than through lyrics.

  • These three plots cover only some of the attributes; many related attributes remain unexplored. Our aim was to bring together the most interesting parts, and anyone interested can easily continue and build upon the existing analyses.
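The numbers that a per-genre box plot draws can be inspected directly with base R's `boxplot()`. In this sketch the valence values are invented, `genre_colors` is a hypothetical named vector playing the role of the “COLORS” tibble, and `plot = FALSE` returns the statistics without drawing.

```r
# Invented valence values for three genres, four tracks each.
songs <- data.frame(
  playlist_genre = rep(c("edm", "latin", "rap"), each = 4),
  valence = c(0.2, 0.3, 0.4, 0.5,
              0.5, 0.6, 0.7, 0.8,
              0.3, 0.4, 0.5, 0.6),
  stringsAsFactors = FALSE
)

# Genre-to-color mapping (hypothetical stand-in for the COLORS tibble).
genre_colors <- c(edm = "#1f77b4", latin = "#ff7f0e", rap = "#2ca02c")

# Compute the box plot statistics per genre without drawing.
bp <- boxplot(valence ~ playlist_genre, data = songs, plot = FALSE)

# Row 3 of bp$stats holds the median of each group, in the order of bp$names.
medians <- setNames(bp$stats[3, ], bp$names)
print(medians)
print(names(which.max(medians)))
```

Dropping `plot = FALSE` and passing `col = genre_colors[bp$names]` would draw the colored boxes instead.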

Music Genre and their Popularity - by Decade of release date

  • Having reviewed the relations between audio features and genres, we now look at the genres themselves in more detail. The table below shows the number of tracks in each genre in this dataset. The most frequent genre is “edm”, while “rock” appears least often.
Genres in the dataset
playlist_genre    n
edm               6043
rap               5746
pop               5507
r&b               5431
latin             5155
rock              4951

The following figure shows the average popularity of songs by decade of release. To make the results clear and easy to compare, we show each genre in a separate panel.

Genre Popularity by Decade


  1. EDM emerged in the 1970s, and its average popularity is 40 or less. This suggests that EDM is not mainstream nowadays and is restricted to a smaller audience.

  2. Latin and pop music have been popular since the 1960s. The 1970s were the golden age for Latin songs, while the 1960s and 1970s were the golden age for pop. These old songs are popular even today!

  3. R&B went through ups and downs. Songs released from the 1980s to the 2000s are less popular than those from other decades.

  4. Rap has been popular since the 1960s, and the oldest rap songs are still the most popular. Songs released in the 2000s have the lowest popularity now.

  5. The popularity of rock music is quite stable across release periods, although songs released from the 1960s to the 1990s are more popular than the others.
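The decade grouping behind this figure can be sketched as below. The rows are invented, and the code assumes that `track_album_release_date` always starts with a four-digit year, even when the month or day is missing.

```r
# Invented rows; some dates carry only a year or a year and month (assumed to occur).
songs <- data.frame(
  playlist_genre           = c("rock", "rock", "rock", "pop", "pop"),
  track_album_release_date = c("1975-03-01", "1979", "1991-09-24",
                               "2019-06-14", "2012-01"),
  track_popularity         = c(60, 70, 50, 66, 40),
  stringsAsFactors = FALSE
)

# Take the first four characters as the year, then floor to the decade.
release_year <- as.integer(substr(songs$track_album_release_date, 1, 4))
songs$decade <- (release_year %/% 10) * 10

# Mean popularity per genre and decade.
decade_pop <- aggregate(track_popularity ~ playlist_genre + decade,
                        data = songs, FUN = mean)
print(decade_pop)
```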

Correlation between Popularity and Audio Features

Internal Relations between Audio Features

The correlation between song features helps us explore why certain tracks become popular. We can see from the correlation plot that the characteristics of each song are specific and unique, but we can summarize them with ten musical attributes. There are three types of relation between attributes: negative correlation, positive correlation, or no correlation. This is important for our later analysis of the tracks’ properties.

For example, a song with high energy tends to also have high loudness and is unlikely to be acoustic. A listener who likes more upbeat songs with higher valence could look for potential favorites among songs with high danceability, high energy and more vocal content. The correlation plot is therefore useful both for analyzing songs and for selecting preferred song attributes; its other uses can be explored later.

  • We can build upon the correlation plot displayed in the Data Description. The plot below shows the correlations as numbers rather than as the shades used in the previous plot.
Correlation between Audio Features


Relationship between Popularity and a certain Audio Feature

Having described the audio features themselves, we now explore whether they contribute to higher popularity. First, we plot each audio feature against popularity in the following figure.

Popularity vs Audio Feature


  • It shows that liveness has a negative relationship with popularity, and that there is no clear relationship between valence and popularity: a higher valence doesn’t necessarily make a song more popular. This is consistent with our sentiment analysis.

  • We are not sure whether the dot plots above directly reveal the relationship between popularity and the audio features, so we also explore whether these features contribute to higher popularity using a linear regression model.

  • Here we kept only songs with a popularity greater than 0, since a popularity of 0 does not make sense in this model. The following table shows all the audio features with a p-value less than 0.05. Valence and danceability have the largest positive estimates, so we conclude that they contribute most to a higher popularity.

  • Acousticness, key, loudness, mode and tempo also have a positive relationship with popularity, while energy, instrumentalness, liveness and speechiness have a negative relationship with popularity, which is similar to the conclusion drawn from the dot plots.

lm (popularity ~ features)
term                estimate    std.error    statistic    p.value
(Intercept)           744.18        15.80        47.10       0.00
acousticness           18.36         6.85         2.68       0.01
danceability           35.67         9.91         3.60       0.00
duration_ms             0.00         0.00       -14.03       0.00
energy               -253.44        11.32       -22.39       0.00
instrumentalness     -109.25         6.07       -18.01       0.00
key                     0.95         0.35         2.69       0.01
liveness              -23.39         8.47        -2.76       0.01
loudness               14.09         0.61        23.07       0.00
mode                    6.61         2.58         2.57       0.01
speechiness           -43.00        12.83        -3.35       0.00
tempo                   0.18         0.05         3.72       0.00
valence                37.22         6.08         6.12       0.00
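A table of this shape can be produced with `lm()` and `summary()`. The sketch below runs on simulated data with only three predictors, so its coefficients are unrelated to the estimates above; the simulated effect sizes are arbitrary assumptions chosen to make the pattern visible.

```r
set.seed(1)
n <- 200

# Simulated features and popularity; the effect sizes are arbitrary assumptions.
sim <- data.frame(
  danceability = runif(n),
  energy       = runif(n),
  valence      = runif(n)
)
sim$track_popularity <- 50 + 20 * sim$danceability - 15 * sim$energy +
  rnorm(n, sd = 5)

# Keep only rows with popularity above 0, mirroring the filter described above.
sim <- sim[sim$track_popularity > 0, ]

# Fit the linear model and collect the coefficient table.
fit   <- lm(track_popularity ~ danceability + energy + valence, data = sim)
coefs <- as.data.frame(summary(fit)$coefficients)
names(coefs) <- c("estimate", "std.error", "statistic", "p.value")

# Terms with p-value below 0.05.
significant <- coefs[coefs$p.value < 0.05, ]
print(round(significant, 2))
```

The sign of each estimate gives the direction of the relationship, but estimates are only comparable in size when the features share a scale.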
  • Using geom_smooth, we can get a clearer picture of how popularity relates to the different audio features and observe how each audio feature trends with increasing popularity.
Popularity vs Audio Feature using Smooth Curves


  • We can observe that most of the audio features have almost no relation with popularity, except for energy and instrumentalness, which negatively affect popularity, and danceability, which positively affects it. This trend is observed for tracks that have a popularity greater than 50.

  • This leads us to extend the analysis to pursue danceability and check which genres and artists are in line with this trend. The next analysis pursues our question as to what the unique selling point of each selected artist is.
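The idea behind the smooth curves can be illustrated with base R's `lowess()`, a local-regression smoother swapped in here so that the sketch runs without ggplot2; it is not necessarily the method geom_smooth applied in the report. The data are simulated with an assumed positive trend.

```r
set.seed(42)
n <- 300

# Simulated feature and popularity with an assumed positive relationship.
danceability <- runif(n)
popularity   <- 50 + 30 * danceability + rnorm(n, sd = 10)

# Locally weighted smoothing; f controls the span of the local fits.
smooth_fit <- lowess(danceability, popularity, f = 2 / 3)

# lowess() returns the x values in increasing order with the fitted y values,
# so comparing both ends of the curve gives the overall direction.
trend_change <- smooth_fit$y[n] - smooth_fit$y[1]
print(trend_change)
```

A positive `trend_change` corresponds to a curve that rises from left to right, as described for danceability.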

Unique Features of Artists

  • Now that we have analyzed the correlations between the audio features, let’s explore which artists are popular and why. This involves analyzing the audio features common to the songs of top artists.

  • We chose the following artists, who are regarded as among the best in their genres:

    • Taylor Swift - Pop
    • Eminem - Rap
    • AC/DC - Rock
    • Shakira - Latin
    • Usher - R&B
    • David Guetta - EDM
  • We select danceability, speechiness, energy and valence as our audio features since these best describe the genres we have selected.

Taylor Swift (Pop Artist) Audio Features

  • First up, let’s see what Taylor’s pop songs are like.
Taylor Swift Audio Features


Eminem (Rap Artist) Audio Features

  • Next up, it’s Eminem’s rap prowess!
Eminem Audio Features


AC/DC (Rock Artist) Audio Features

  • Next, it’s AC/DC’s rocking songs!
AC/DC Audio Features


Shakira (Latin Artist) Audio Features

  • Let’s look at Shakira’s Latin songs!
Shakira Audio Features


Usher (R&B Artist) Audio Features

  • Let’s look at Usher’s Rhythm and Blues!
Usher Audio Features


David Guetta (EDM Artist) Audio Features

  • Finally, David’s Electronic Dance Music!
David Guetta Audio Features


Mean Values for the Audio Features are as follows -

Danceability -

  • Taylor Swift - 0.6701429
  • Eminem - 0.7346154
  • AC/DC - 0.51084
  • Shakira - 0.7347
  • Usher - 0.7614706
  • David Guetta - 0.61415

Speechiness -

  • Taylor Swift - 0.07605
  • Eminem - 0.2390026
  • AC/DC - 0.094764
  • Shakira - 0.09703
  • Usher - 0.0963118
  • David Guetta - 0.078215

Energy -

  • Taylor Swift - 0.6827143
  • Eminem - 0.7578462
  • AC/DC - 0.82136
  • Shakira - 0.7417
  • Usher - 0.5531176
  • David Guetta - 0.8260667

Valence -

  • Taylor Swift - 0.6375
  • Eminem - 0.4266256
  • AC/DC - 0.53016
  • Shakira - 0.76085
  • Usher - 0.6858235
  • David Guetta - 0.3748767
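Per-artist means like those listed above can be computed in one pass by splitting the tracks by artist and averaging the chosen columns. The two artists and their values below are hypothetical placeholders, not the six artists analyzed here.

```r
# Hypothetical tracks for two placeholder artists.
artist_tracks <- data.frame(
  track_artist = c("Artist X", "Artist X", "Artist Y", "Artist Y"),
  danceability = c(0.60, 0.80, 0.50, 0.50),
  speechiness  = c(0.05, 0.15, 0.30, 0.20),
  energy       = c(0.70, 0.90, 0.40, 0.60),
  valence      = c(0.60, 0.40, 0.30, 0.50),
  stringsAsFactors = FALSE
)
feature_cols <- c("danceability", "speechiness", "energy", "valence")

# Split the feature columns by artist and average each column;
# transpose so that rows are artists and columns are features.
by_artist    <- split(artist_tracks[feature_cols], artist_tracks$track_artist)
artist_means <- t(sapply(by_artist, colMeans))

# For each feature, the artist with the highest mean.
leaders <- rownames(artist_means)[apply(artist_means, 2, which.max)]
names(leaders) <- colnames(artist_means)

print(round(artist_means, 3))
print(leaders)
```

The `leaders` vector is one simple way to read a candidate unique selling point off the table: the feature on which an artist leads the comparison.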

Conclusion

References